sentence-level node
Sentence-State LSTM for Text Representation – Arxiv Vanity
Hyperparameters: Table 2 shows the development results of various S-LSTM settings, where Time refers to training time per epoch. Adding one additional sentence-level node as described in Section 3.2 does not lead to accuracy improvements, although the number of parameters and decoding time increase accordingly. As a result, we use only 1 sentence-level node for the remaining experiments. The accuracies of S-LSTM increases as the hidden layer size for each node increases from 100 to 300, but does not further increase when the size increases beyond 300. We fix the hidden size to 300 accordingly.
Sentence-State LSTM for Text Representation
Zhang, Yue, Liu, Qi, Song, Linfeng
Bidirectional LSTMs are a powerful tool for text representation. On the other hand, they have been shown to suffer various limitations due to their sequential nature. We investigate an alternative LSTM structure for encoding text, which consists of a parallel state for each word. Recurrent steps are used to perform local and global information exchange between words simultaneously, rather than incremental reading of a sequence of words. Results on various classification and sequence labelling benchmarks show that the proposed model has strong representation power, giving highly competitive performances compared to stacked BiLSTM models with similar parameter numbers. 1 Introduction Neural models have become the dominant approach in the NLP literature. Compared to handcrafted indicator features, neural sentence representations are less sparse, and more flexible in encoding intricate syntactic and semantic information. Among various neural networks for encoding sentences, bidirectional LSTMs (BiLSTM) (Hochreiter and Schmidhuber, 1997) have been a dominant method, giving state-of-the-art results in language modelling (Sundermeyer et al., 2012), machine translation (Bahdanau et al., 2015), syntactic parsing (Dozat and Manning, 2017) and question answering (Tan et al., 2015). Despite their success, BiLSTMs have been shown to suffer several limitations.